Rapid intra-host diversification and evolution of SARS-CoV-2 in advanced HIV infection

Previous studies have linked the evolution of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) genetic variants to persistent infections in people with immunocompromising conditions, but the processes responsible for these observations are incompletely understood. Here we use high-throughput, single-genome amplification and sequencing (HT-SGS) to sequence SARS-CoV-2 spike genes from people with HIV (PWH, n = 22) and people without HIV (PWOH, n = 25). In PWOH and PWH with CD4 T cell counts (i.e., CD4 counts) ≥ 200 cells/μL, we find that most SARS-CoV-2 genomes sampled in each person share one spike sequence. By contrast, in people with advanced HIV infection (i.e., CD4 counts < 200 cells/μL), HT-SGS reveals a median of 46 distinct linked groupings of spike mutations per person. Elevated intra-host spike diversity in people with advanced HIV infection is detected immediately after COVID-19 symptom onset, and early intra-host spike diversity predicts SARS-CoV-2 shedding duration among PWH. Analysis of longitudinal timepoints reveals rapid fluctuations in spike sequence populations, replacement of founder sequences by groups of new haplotypes, and positive selection at functionally important residues. These findings demonstrate remarkable intra-host genetic diversity of SARS-CoV-2 in advanced HIV infection and suggest that adaptive intra-host SARS-CoV-2 evolution in this setting may contribute to the emergence of new variants of concern.


Software and code
Policy information about availability of computer code

Data analysis
CCS reads were demultiplexed with lima (v.2.5.1). Reads were oriented with vsearch (v.2.21.1). Cutadapt (v.4.1) was used to trim primer sequences and perform length filtering. Python scripts available in UMI-pacbio-pipeline (v.1.1; https://github.com/niaid/UMI-pacbio-pipeline/releases/tag/SC2-HIV-demo) were used to parse and bin reads based on their inferred UMI sequence. UMI bins were clustered with vsearch, and consensus single-genome sequences were generated with minimap2 (v.2.24) and bcftools (v.1.13). Non-SARS-CoV-2 sequences were removed with blastn (v.2.9.0). Python scripts in UMI-pacbio-pipeline were used to filter final sequences and call haplotypes. Haplotype entropy was computed with a Perl script, and Jensen-Shannon distance was computed via the 'jensenshannon' method in SciPy (v.1.8.1). Recombinant haplotypes were identified with 3SEQ (v.1.8.0). To perform phylogenetic inference, indels were encoded via 2matrix (v.1.0) and ML tree reconstruction with bootstrapping was performed with iqtree (v.1.6.12). Phylogenetic clades were identified with TreeCluster (v.1.0.3). Hypothesis testing of selection was performed with FUBAR in HyPhy (v.2.5.46). GraphPad Prism (v.9.4.0) was used for statistical testing and plotting of datapoints. Phylogenetic trees were plotted with EvolView (v.

HT-SGS for the SARS-CoV-2 spike gene was performed twice for 128 samples that had high viral load, whereas the remaining samples were sequenced once due to insufficient amounts of viral RNA. In tests of HT-SGS performance with the SARS-CoV-2 spike gene from various studies, repeated runs produced reproducible sequencing data.
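
The haplotype entropy and Jensen-Shannon distance named in the data analysis description can be illustrated with a minimal sketch. This is not the authors' code: the source states that entropy was computed with a Perl script and the distance with SciPy's 'jensenshannon' method, whereas the version below reimplements both in plain Python so that it runs without dependencies. The function names, the haplotype labels, and the choice of log base 2 are assumptions made for illustration only.

```python
import math
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Map each distinct haplotype label to its relative frequency."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


def shannon_entropy(freqs, base=2.0):
    """Shannon entropy of a frequency dict (log base is an assumption)."""
    return sum(-p * math.log(p, base) for p in freqs.values() if p > 0)


def jensen_shannon_distance(freqs_a, freqs_b, base=2.0):
    """Square root of the Jensen-Shannon divergence between two frequency dicts."""
    divergence = 0.0
    for key in set(freqs_a) | set(freqs_b):
        p = freqs_a.get(key, 0.0)
        q = freqs_b.get(key, 0.0)
        m = 0.5 * (p + q)  # mixture frequency; positive whenever p or q is
        if p > 0:
            divergence += 0.5 * p * math.log(p / m, base)
        if q > 0:
            divergence += 0.5 * q * math.log(q / m, base)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


if __name__ == "__main__":
    # Hypothetical single-genome haplotype calls at two timepoints.
    early = ["H1"] * 6 + ["H2"] * 3 + ["H3"]
    late = ["H2"] * 2 + ["H4"] * 5 + ["H5"] * 3

    f_early = haplotype_frequencies(early)
    f_late = haplotype_frequencies(late)

    print("entropy (early):", shannon_entropy(f_early))
    print("entropy (late):", shannon_entropy(f_late))
    print("JS distance:", jensen_shannon_distance(f_early, f_late))
```

With base 2, the distance ranges from 0 (identical haplotype populations) to 1 (no shared haplotypes), which makes shifts between longitudinal timepoints easy to compare across participants. SciPy's `jensenshannon` uses the natural logarithm unless a `base` argument is given, so values from this sketch match SciPy only when the same base is used.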
No randomization was performed. We analysed all available data.
No blinding was performed. We analysed all available data.

CR3022 - produced by South African Medical Research Council Antibody Immunity Research Unit; working dilution 10 μg/mL
Palivizumab - produced by Medimmune, RRID: AB_2459638; working dilution 10 μg/mL
Anti-human-horseradish-peroxidase - produced by Merck, catalog number A0170-1

Antibodies were cloned, expressed and quality controlled according to published IC50 or EC50 data.
Data collection
Amplicon sequences were collected on a Pacific Biosciences Sequel II sequencer using a 20-hour movie time under circular consensus sequencing (CCS) mode with SMRT Link (v.11.0.0.146107).